-
Notifications
You must be signed in to change notification settings - Fork 734
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add dedicated phase waiting traffic increase #1252
base: main
Are you sure you want to change the base?
Add dedicated phase waiting traffic increase #1252
Conversation
7a0d9a5
to
5ec4f87
Compare
Codecov Report
@@ Coverage Diff @@
## main #1252 +/- ##
==========================================
- Coverage 54.74% 54.69% -0.05%
==========================================
Files 81 81
Lines 6949 6966 +17
==========================================
+ Hits 3804 3810 +6
- Misses 2550 2561 +11
Partials 595 595
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. |
5ec4f87
to
7a4edef
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for opening this PR! :)
Please sign off your commit: https://stackoverflow.com/questions/1962094/what-is-the-sign-off-feature-in-git-for
Thanks for the review so far, I signed off my first commit. Then in the review I committed the suggested change through GitHub, and it turned out it's missing the signed-off-by. I will add the missing signed-off-by message and rebase this after the review is completed because it requires force push and it would restart the review. If you think otherwise, please let me know. |
That's not an issue, please rebase and force push. |
This is to fix fluxcd#1248 where canary is restarted to "Starting canary analysis" on confirm traffic increase failure. Signed-off-by: Andy Librian <andylibrian@gmail.com>
b2238ef
to
eb276db
Compare
Hi @aryan9600, I have rebased and force push this branch as suggested. Do you have any more comment / feedback? |
I'm currently testing this with various Canary objects, to make sure that adding this new phase doesn't break any existing functionality. If you want, you can do some testing as well, to make sure no bugs creep in inadvertently, thanks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to work okay when the confirm-traffic-increase
hook returns successfully at the very start of the analysis and then returns an error later during the analysis. But it doesn't work as expected if the confirm-traffic-increase
hook returns an error at the very first iteration.
In this case, we never progress the canary forward (since the hook returned an error) and set the Canary's phase to WaitingTrafficIncrease
while the canary weight is still 0, so in the next iterations, we just keep running the metric checks, which return an error since no traffic has reached the canary yet.
flagger/pkg/controller/scheduler.go
Line 413 in b2dc762
if ok := c.runAnalysis(cd); !ok { |
Now, if after a few iterations, the confirm-traffic-increase
hook returns successfully, the analysis doesn't proceed as expected. This is due to the fact that instead of progressing the canary forward and consequently increasing the canary weight up from zero, we keep running the metric checks, which again return an error because no traffic reached the canary ever, leading to:
{"level":"info","ts":"2022-08-24T23:20:27.962+0530","caller":"controller/events.go:45","msg":"Halt podinfo.test advancement waiting for traffic increase approval automata","canary":"podinfo.test"}
{"level":"info","ts":"2022-08-24T23:20:42.948+0530","caller":"controller/events.go:45","msg":"Halt advancement no values found for contour metric request-success-rate probably podinfo.test is not receiving traffic: running query failed: no values found","canary":"podinfo.test"}
{"level":"info","ts":"2022-08-24T23:20:57.948+0530","caller":"controller/events.go:45","msg":"Halt advancement no values found for contour metric request-success-rate probably podinfo.test is not receiving traffic: running query failed: no values found","canary":"podinfo.test"}
Steps to repro:
- Create this Canary:
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
name: podinfo
spec:
analysis:
interval: 15s
maxWeight: 50
metrics:
- interval: 1m
name: request-success-rate
threshold: 99
- interval: 1m
name: request-duration
threshold: 500
stepWeight: 10
threshold: 15
webhooks:
- name: automata
timeout: 5s
type: confirm-traffic-increase
url: http://<flagger-loadtester-url>/gate/check
- metadata:
cmd: hey -z 1m -q 5 -c 2 -host app.example.com http://envoy.projectcontour
logCmdOutput: "true"
type: cmd
name: load-test
timeout: 5s
url: http://<flagger-loadtster-url>
progressDeadlineSeconds: 60
provider: contour
service:
port: 80
targetPort: 9898
targetRef:
apiVersion: apps/v1
kind: Deployment
name: podinfo
- Close the gate:
curl -v -X POST --data '{"type": "confirm-traffic-increase", "name": "podinfo", "namespace": "test", "phase": "Initialized"}' http://<flagger-loadtester-url>/gate/close
- Kick off the analysis:
kubectl -n test set image deploy/podinfo podinfod==ghcr.io/stefanprodan/podinfo:6.0.1
- Wait for the first couple of iterations and then open the gate:
curl -v -X POST --data '{"type": "confirm-traffic-increase", "name": "podinfo", "namespace": "test", "phase": "WaitingTrafficIncrease"}' http://<flagger-loadtester-url>/gate/open
Please let me know if I'm unclear and you want me to elaborate further :)
This is to fix #1248 where canary is restarted to "Starting canary analysis" on confirm traffic increase failure.
Signed-off-by: Andy Librian andylibrian@gmail.com